Abstract. Novel auction schemes are constantly being designed. Their
design has significant consequences for the allocation of goods and the
revenues generated. But how can one tell whether a new design has the desired
properties, such as efficiency, i.e. allocating goods to those bidders who
value them most? We say: by formal, machine-checked proofs. We investigated
the suitability of the Isabelle, Theorema, Mizar, and Hets/CASL/TPTP
theorem provers for reproducing a key result of auction theory:
Vickrey's 1961 theorem on the properties of second-price auctions. Based
on our formalisation experience, taking an auction designer's perspective,
we give recommendations on what system to use for formalising auctions,
and outline further steps towards a complete auction theory toolbox.



1 Motivation: Why Formalise Auction Theory? 

Auctions are a widely used mechanism for allocating goods and services,¹
perhaps second in importance only to markets. They are used to allocate
electromagnetic spectrum, airplane landing slots, oil fields, bankrupt firms,
works of art, and eBay items, and to establish exchange rates, treasury bill
yields, and stock exchange opening prices. Novel auction schemes are constantly
being designed, aiming to maximise the auctioneer's revenue, foster competition
in subsequent markets, and efficiently allocate resources.

Auction design can have significant consequences. Klemperer attributed the
low revenues gained in some government auctions of the 3G radio spectrum in
2000 (€20 per capita vs. €600 in other countries) to bad design [18]. Design
practice outstrips theory, especially for complex modern auctions such as
combinatorial ones, which accept bids on subsets of items (e.g. collections of
spectrum). Designing a revenue-maximising auction is NP-complete [6] even with
a single bidder. Important auctions often run 'in the wild' with few formal
results [19]. We aim to convince auction designers that investing in
formalisation pays off with machine-checked proofs and a deeper understanding
of the theory. To this end, we want to provide them with a toolbox of basic
auction theory formalisations, on top of which they can formalise and verify
their own auction designs, which typically combine standard building blocks,
e.g. an ascending auction converting to a sealed-bid auction when the number of
remaining bidders equals the number of items available. Given the ubiquity of
specialist support across a range of service sectors, we conjecture that
auction designers might be supported by formalisation experts, creating a niche
for specially trained experts at the interface of the core mechanised reasoning
community and auction designers.

Our ForMaRE project (formal mathematical reasoning in economics [22])
seeks to increase confidence in economics' theoretical results, to aid in
discovering new results, and to foster interest in formal methods within
economics. To formal methods, we seek to contribute new challenge problems and
user experience feedback from new audiences. Auctions are representative of
practically relevant fields of economics that have hardly been formalised so
far.² Economics has been formalised before [15], particularly social choice
theory (cf. §5 and [10]) and game theory (cf. [37] and our own work [16]).
However, none of these formalisations involved economists. Formalising
(mathematical) theories and applying mechanised reasoning tools remain novel
to economics.³

§2 establishes requirements for the Auction Theory Toolbox (ATT); §3 ex- 
plains our approach to building it. §4 is our main contribution: a qualitative 
comparison of how well four different theorem provers satisfy our requirements. 
§5 reviews related work, and §6 concludes and provides an outlook. 

2 Requirements for an Auction Theory Toolbox 

Conversations with auction designers established ATT requirements as follows: 

D1 Formalise ready-to-use basic auction concepts, including their definitions
and essential properties.

D2 Allow for extension and application to custom-designed auctions without
requiring expert knowledge of the underlying mechanised reasoning system.

From a computer scientist's technical perspective, these translate to: 

² Even code verification is typically not considered, although Leese, who worked
on the UK's spectrum auctions, has called for auction software to be added to
the Verified Software Repository at http://vsr.sourceforge.net [47].

³ There is a field called 'computational economics'; however, it is mainly
concerned with the numerical computation of solutions or simulations (cf.,
e.g., [13]).



C1 Identify the right language to formalise auction theory. This language should
(a) be sufficiently expressive for concisely capturing complex concepts, while
supporting efficient proofs for the majority of problems, (b) be learnable for
economists used to mathematical textbook notation, and (c) provide libraries
of the mathematical foundations underlying auctions.

C2 Identify a mechanised reasoning system (a) that assists with cost-effective 
development of formalisations, (b) that facilitates reuse of formalisations 
already existing in the toolbox, (c) that creates comprehensible output to 
help users understand, e.g., why a proof attempt failed, or what knowledge 
was used in proving a goal, and (d) whose community is supportive towards 
users with little specific technical and theoretical background. 

Note the potential conflicts between these requirements: a single language
might not meet requirement C1a, and if it did, it might not be supported by a
user-friendly system.

3 Approach to Building the Auction Theory Toolbox 

To avoid a chicken-and-egg problem, we identify relevant domain problems in 
parallel to identifying languages and systems suitable for formalisation. 

3.1 The Domain Problem: Vickrey's Theorem and Beyond 

We started with Vickrey's 1961 theorem on the properties of second-price
auctions of a single, indivisible good, whose bidders' private values are not
publicly known. Each participant submits a sealed bid; one of the highest
bidders wins, and pays the highest remaining bid; the losers pay nothing.
Vickrey proved that 'truth-telling' (submitting a bid equal to one's actual
valuation of the good) was a weakly dominant strategy, i.e. that no bidder can
do strictly better by bidding above or below their valuation, whatever the
other bidders do. Thus, when all bidders bid truthfully, the auction is also
efficient, allocating the item to the bidder with the highest valuation.
Bidders only have to know their own valuations; in particular they need no
information about others' valuations or the distributions these are drawn from.

As variants of Vickrey auctions are widely used (e.g. by eBay, Google and
Yahoo! [45]), this formalisation will enable us to prove properties of
contemporary auctions as well. The underlying theory is straightforward to
understand even for non-economists and can be formalised with reasonable
effort. Finally, formalising Vickrey's theorem provides a good introduction for
domain experts to mechanised reasoning technology by serving as a small,
self-contained showcase of a widely known result, helping to build trust in
this new technology.

Maskin collected 13 theorems, including Vickrey's, in a review [24] of an
influential auction theory textbook [25]. This sets the roadmap for building
the ATT, a collaborative effort to which we welcome community contributions
[23].

3.2 Paper Elaboration to Prepare the Machine Formalisation 

To prepare the machine formalisation, we refined the original paper source,
aware that current mechanised reasoning systems typically require much more
explicit statements than commonly found on paper: automated provers must find
proofs without running out of search space, whereas proof checkers require
proofs at a certain level of detail, which in turn requires detailed
statements. Maskin states Vickrey's theorem in two sentences and proves it in
another six sentences [24, Proposition 1].⁴ Our elaboration uses eight
definitions specific to the domain problem plus an auxiliary one about maximum
components of vectors, as follows:
$N = \{1, \ldots, n\}$ is a set of participants, often indexed by $i$. An
allocation is a vector $x \in \{0, 1\}^n$ where $x_i = 1$ denotes participant
$i$'s award of the indivisible good to be auctioned (i.e. '$i$ wins'), and
$x_i = 0$ otherwise. An outcome $(x, p)$ specifies an allocation and a vector
of payments, $p \in \mathbb{R}^n$, made by each participant $i$. Participant
$i$'s payoff is $u_i = v_i x_i - p_i$, where $v_i \in \mathbb{R}_+$ is $i$'s
valuation of the good. A strategy profile is a vector $b \in \mathbb{R}^n$,
where $b_i \geq 0$ is called $i$'s bid.⁵ For an $n$-vector
$y = (y_1, \ldots, y_n) \in \mathbb{R}^n$, let $\hat{y} = \max_{j \in N} y_j$
and $\hat{y}_{-i} = \max_{j \in N \setminus \{i\}} y_j$.

Definition 1 (Second-Price Auction). Given $M = \{i \in N : b_i = \hat{b}\}$,
a second-price auction is an outcome $(x, p)$ satisfying:

1. $\forall j \in N \setminus M,\ x_j = p_j = 0$; and

2. for one⁶ $i \in M$, $x_i = 1$ and $p_i = \hat{b}_{-i}$, while
$\forall j \in M \setminus \{i\},\ x_j = p_j = 0$.

Definition 2 (Efficiency). An efficient auction maximises $\sum_{i \in N} v_i x_i$
for a given $v$, i.e., for a single good, $x_i = 1 \Rightarrow v_i = \hat{v}$.
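To make Definitions 1 and 2 concrete, here is a minimal executable sketch in
Python. It is an illustration, not one of our formalisations: it assumes 0-based
participant indices (instead of $N = \{1, \ldots, n\}$), at least two
participants, and it breaks ties among the highest bidders $M$ in favour of the
lowest index, whereas Definition 1 only requires that one of them wins. All
function names are hypothetical.

```python
from typing import List, Tuple


def max_except(y: List[float], i: int) -> float:
    """hat{y}_{-i}: the maximum of y over all indices except i."""
    return max(y[j] for j in range(len(y)) if j != i)


def second_price_auction(b: List[float]) -> Tuple[List[int], List[float]]:
    """Outcome (x, p) of a single-good second-price auction (Definition 1).

    Assumes len(b) >= 2; among the highest bidders, the lowest index wins.
    """
    n = len(b)
    if n < 2:
        raise ValueError("need at least two participants")
    winner = b.index(max(b))            # one i in M = {i : b_i = hat{b}}
    x = [0] * n                         # everyone else gets x_j = 0 ...
    p = [0.0] * n                       # ... and pays p_j = 0
    x[winner] = 1
    p[winner] = max_except(b, winner)   # the winner pays hat{b}_{-i}
    return x, p


def payoff(v: List[float], x: List[int], p: List[float], i: int) -> float:
    """u_i = v_i * x_i - p_i."""
    return v[i] * x[i] - p[i]


def is_efficient(v: List[float], x: List[int]) -> bool:
    """Definition 2 for a single good: x_i = 1 implies v_i = hat{v}."""
    return all(v[i] == max(v) for i in range(len(v)) if x[i] == 1)


# Usage with truthful bids b = v.
v = [5.0, 8.0, 3.0]
x, p = second_price_auction(v)
print(x, p, payoff(v, x, p, 1), is_efficient(v, x))  # [0, 1, 0] [0.0, 5.0, 0.0] 3.0 True
```

With truthful bids, participant 1 (0-based) wins, pays the second highest bid 5
and obtains payoff 3; the allocation is efficient.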

Definition 3 (Weakly Dominant Strategy). Given some auction, a strategy
profile $b$ supports an equilibrium in weakly dominant strategies if, for each
$i \in N$ and any $\tilde{b} \in \mathbb{R}^n$ with $\tilde{b}_i \neq b_i$,
$u_i(\tilde{b}_1, \ldots, \tilde{b}_{i-1}, b_i, \tilde{b}_{i+1}, \ldots, \tilde{b}_n) \geq u_i(\tilde{b})$.⁷
I.e., whatever others do, $i$ will not be better off by deviating from the
original bid $b_i$.

Theorem 1 (Vickrey 1961; Milgrom 2.1). In a second-price auction, the
strategy profile $b = v$ supports an equilibrium in weakly dominant strategies.
Furthermore, the auction is efficient.

The attempt to be close to a paper formalisation may introduce artefacts
that unnecessarily complicate machine formalisation. E.g., the contiguous
numeric participant indexing is merely a convention: formally any relation
between participants' valuation, bid, allocation, and payment vectors suffices.
Similarly, the product $v_i x_i$ recalls the general divisible good case
($x_i \in [0, 1]$) and works around the lack of an easy and compact
'if-then-else' textbook notation.⁸

⁴ The high level of Maskin's text is owed to its summative nature. Original
proofs in auction theory are typically more thorough.

⁵ This simplification is sufficient for proving the theorem. More precisely, all
participants know that each $v_i$ is an independent realisation of a random
variable with distribution density $f$. A participant's strategy is a mapping
$g_i$ such that $b_i = g_i(v_i, f)$.

⁶ When running an auction in practice, this $i$ may be selected randomly, but
this circumstance does not matter for the proof of Vickrey's theorem.

⁷ The notation $u_i(b)$ is standard in economics but formally misleading. A more
careful notation is $u_i(x_i, v_i, p_i)$, where $x_i$ and $p_i$ depend on $b$
and the auction type.

⁸ Case distinctions with curly braces consume at least two lines.



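Proof. Suppose participant $i$ bids $b_i = v_i$, whatever $\tilde{b}_j$ the
others bid. Let $\tilde{b}^{i \leftarrow v_i}$ abbreviate the overall vector
$(\tilde{b}_1, \ldots, \tilde{b}_{i-1}, v_i, \tilde{b}_{i+1}, \ldots, \tilde{b}_n)$,
and note that the highest competing bid $\hat{\tilde{b}}_{-i}$ does not depend
on $i$'s own bid. There are two cases:⁹

1. $i$ wins. This implies $v_i \geq \hat{\tilde{b}}_{-i} = p_i$, and
$u_i(\tilde{b}^{i \leftarrow v_i}) = v_i - p_i = v_i - \hat{\tilde{b}}_{-i} \geq 0$.
Now consider $i$ submitting an arbitrary bid $\tilde{b}_i \neq v_i$, i.e.
assume an overall bid vector $\tilde{b}$. This has two sub-cases:

(a) $i$ wins with the other bid, i.e. $u_i(\tilde{b}) = u_i(\tilde{b}^{i \leftarrow v_i})$,
as the second highest bid $\hat{\tilde{b}}_{-i}$, and hence $i$'s payment, has
not changed.

(b) $i$ loses with the other bid, i.e. $u_i(\tilde{b}) = 0 \leq u_i(\tilde{b}^{i \leftarrow v_i})$.

2. $i$ loses. This implies $p_i = 0$, $u_i(\tilde{b}^{i \leftarrow v_i}) = 0$,
and $v_i \leq \hat{\tilde{b}}_{-i}$; otherwise $i$ would have won. This yields
again two cases for $i$'s alternative bid $\tilde{b}_i$:

(a) $i$ wins, i.e. $u_i(\tilde{b}) = v_i - \hat{\tilde{b}}_{-i} \leq 0 = u_i(\tilde{b}^{i \leftarrow v_i})$.

(b) $i$ loses, i.e. $u_i(\tilde{b}) = 0 = u_i(\tilde{b}^{i \leftarrow v_i})$.

By analogy for all $i$, $b = v$ supports an equilibrium in weakly dominant
strategies. Efficiency is immediate: the highest bidder has the highest
valuation. □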
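The case analysis above can be cross-checked by exhaustive search on a small
finite domain. The Python sketch below is not a proof: it covers only three
participants with integer valuations and bids from 0 to 4, an assumption made
for tractability, and it again breaks ties towards the lowest index (a
hypothetical choice). It tests Definition 3 directly: for every valuation
vector, every participant $i$, every combination of the others' bids and every
deviating bid $\tilde{b}_i \neq v_i$, bidding $v_i$ is never strictly worse.

```python
from itertools import product


def outcome(b):
    """Second-price auction: the lowest-index highest bidder wins and pays
    the highest bid among the others (assumes at least two bidders)."""
    winner = b.index(max(b))
    price = max(b[j] for j in range(len(b)) if j != winner)
    return winner, price


def utility(v, b, i):
    """u_i = v_i * x_i - p_i for participant i under bid vector b."""
    winner, price = outcome(b)
    return v[i] - price if winner == i else 0


def with_bid(others, i, bid):
    """Overall bid vector: the others' bids with i's bid inserted at position i."""
    return list(others[:i]) + [bid] + list(others[i:])


def truthful_is_weakly_dominant(n=3, values=range(5)):
    """Exhaustively check Definition 3 for b = v on a finite grid."""
    for v in product(values, repeat=n):
        for i in range(n):
            for others in product(values, repeat=n - 1):
                truthful = with_bid(others, i, v[i])   # tilde-b^{i <- v_i}
                for dev in values:
                    if dev == v[i]:
                        continue
                    deviating = with_bid(others, i, dev)  # tilde-b
                    if utility(v, deviating, i) > utility(v, truthful, i):
                        return False
    return True


def truthful_is_efficient(n=3, values=range(5)):
    """Under truthful bidding, the winner has the highest valuation."""
    return all(v[outcome(list(v))[0]] == max(v) for v in product(values, repeat=n))


print(truthful_is_weakly_dominant(), truthful_is_efficient())  # True True
```

Such a search only samples the theorem; the point of the machine-checked proofs
discussed below is that they cover arbitrary $n$ and real-valued bids.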



3.3 Choosing Language and System 

In terms of logic, it is not immediately obvious whether Vickrey's theorem is
inherently higher-order. Defining the maximum operator on arbitrarily sized
finite sets of real-valued bids and proving its essential properties requires
induction and thus exceeds first-order logic (FOL); the same holds for the
finiteness of a set¹⁰ and a formalisation of real numbers.¹¹ However, if one
takes real vectors and a maximum operation on them for granted, and explicitly
requires the maximum to exist, FOL suffices to formalise the relevant domain
concepts: single good auctions, second-price auctions, and the theorem
statement.¹²

In terms of syntax, we assume that auction designers will prefer a language
that is close to the textbook mathematics they are used to, rather than one
with a programming language flavour. We also assume that type annotations, even
if optional, support intuitive modelling of domain concepts (e.g. an auction as
a function that takes bids and returns an allocation and payments) and prevent
formalisation mistakes by cheap early checks (cf. [21]).

In terms of user experience, we study two paradigms. Automated provers try to
find a proof automatically, given a theorem and a knowledge base; this may
appeal to our audience if the user just has to push a button (as with model
checkers). Interactive provers check a proof written by the user, which may be
convenient when a paper proof already exists.

⁹ Our initial elaboration of Maskin's proof, which distinguishes cases on the
basis of participants' bids, resulted in nine leaf cases. Straightforward on
paper, these were tedious to formalise in Isabelle, which triggered the
rearrangement shown here.

¹⁰ Finiteness matters: the set $\{b_i = 1 - 1/i : i = 1, 2, 3, \ldots\}$ has no maximum.

¹¹ Real numbers are not usually required for running auctions in practice. Even
financial exchanges that allow 'sub-pennying' have a minimal discrete quantum of
currency.

¹² For instance, our Mizar proof never invokes any second-order scheme directly.
Two proof steps use the fact that a finite set of numbers includes its maximum,
which is proved in the Mizar Mathematical Library (MML) using the induction
scheme.



4 Qualitative Comparison of the Languages and Systems 



We have formalised Vickrey's theorem in four systems, which differ in logic,
syntax and user experience: Isabelle, followed by Mizar, CASL and Theorema. For
each system at least one author has in-depth knowledge. The purpose of redoing
formalisations from scratch is to understand the specific advantages and
disadvantages of the systems and to obtain as idiomatic a formalisation as
possible. The formalisations and instructions for using them are available from
the ATT homepage [23]. Tab. 1 compares the features of the systems and their
languages and shows the state of our work. The following subsections assess the
languages and systems w.r.t. the technical requirements C1 and C2 of §2. Tab. 2
at the end of this section summarises our findings to underpin our final
recommendations.

4.1 Level of Detail and Explicitness Required (req. C1a)

All systems required greater detail and explicitness than the paper elaboration
of §3.2. The Isabelle formalisation needs 3 additional definitions and 7
auxiliary lemmas. Guiding the automated provers of Theorema and Hets and
Mizar's proof checker required similar numbers of auxiliary statements, plus, in
Theorema and Hets, further ones to emulate proof steps (cf. §4.2). However,
first steps beyond Vickrey's theorem suggest that these auxiliaries make it
easier to formalise further notions. As our work involved beginners and
experts,¹³ we can only approximately quantify the formalisation effort beyond
the paper elaboration. The 'de Bruijn factor' [40], the formalisation size
divided by the size of an informal TeX source, measured after stripping comments
and xz compression, is around 1.5 for all formalisations¹⁴ except Theorema.¹⁵
This observation suggests that machine formalisation is generally still harder
than elaboration on paper.
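The de Bruijn factor as described here can be computed along the following
lines. This Python sketch is an assumption about the procedure, not our actual
measurement script: it uses the standard `lzma` module (whose `compress` writes
the .xz format by default), strips only TeX `%` comments, and expects the
formal source to be comment-free already, since comment syntax differs between
Isabelle, Mizar, CASL and Theorema.

```python
import lzma
import re


def strip_tex_comments(source: str) -> str:
    """Drop everything after an unescaped % on each line of a TeX source."""
    # Simplification: a TeX line break directly followed by % is not handled.
    return re.sub(r"(?<!\\)%.*", "", source)


def xz_size(text: str) -> int:
    """Size in bytes after xz compression."""
    return len(lzma.compress(text.encode("utf-8")))


def de_bruijn_factor(formal: str, informal_tex: str) -> float:
    """Compressed size of the formalisation divided by that of the TeX source."""
    return xz_size(formal) / xz_size(strip_tex_comments(informal_tex))
```

A factor of 1.5 thus means that, after compression, the formalisation is half
as large again as the informal text.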

Even though explicit machine formalisation imposes tedious work on the author,
it can also prove beneficial. On paper, it was neither immediately obvious that
exactly one participant wins a second-price auction, nor that the outcome is a
function of the bids. While it is obvious that at least two participants are
required to define the 'second highest bid', the standard literature largely
overlooks this point; formalisation, however, forced us to choose whether to
allow a single participant (by, e.g., defining $\max \emptyset = 0$) or to
explicitly require $n \geq 2$.
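The two options can be contrasted in code. In the hypothetical Python helpers
below, the first adopts the convention $\max \emptyset = 0$, so a lone bidder
gets the good for free; the second makes the precondition $n \geq 2$ explicit
and rejects smaller inputs.

```python
from typing import Sequence


def price_with_empty_max_zero(b: Sequence[float], winner: int) -> float:
    """Highest bid among the other participants, defining max of the empty set as 0."""
    return max((b[j] for j in range(len(b)) if j != winner), default=0)


def price_requiring_two_bidders(b: Sequence[float], winner: int) -> float:
    """Highest bid among the other participants; rejects auctions with n < 2."""
    if len(b) < 2:
        raise ValueError("the 'second highest bid' needs at least two participants")
    return max(b[j] for j in range(len(b)) if j != winner)
```

Both agree whenever $n \geq 2$; they differ only in how the degenerate case
surfaces, silently or as an error.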

4.2 Expressiveness vs. Efficiency (req. C1a)

As discussed in §3.2, we did not strictly take the elaborated paper source as a
specification for the formalisation, but wrote idiomatic formalisations. In
Isabelle and Mizar, we, e.g., avoided specific intervals $\{1, \ldots, n\}$ as
sets of auction participants: arbitrary (finite) sets of natural numbers
simplify the formalisation, and the highest and second highest bids are
determined using library set operations. In contrast, Theorema naturally indexes
its built-in tuples from 1 to $n$ and allows for restricting quantified
variables to such ranges, e.g. $\forall_{i = 1, \ldots, n}$.

¹³ The Mizar formalisation was, e.g., completely written by an expert (Caminati),
whereas the Isabelle formalisation was initially written by a first-time user
with a general logic background (Lange), then largely rewritten by an expert
(Wenzel).

¹⁴ A typical average is 4, but our paper proof is particularly detailed.

¹⁵ Determining a de Bruijn factor for Theorema does not make sense: single
keystrokes or clicks may yield complex inputs, Mathematica notebooks store
layout and maintenance information, and Theorema caches proofs in the notebook
(cf. §4.6).

Table 1. Languages and systems we compared; state of our formalisations

| Language | Logic | Prover | User interface | Licence | Formalisation |
|---|---|---|---|---|---|
| Isabelle/HOL 2013 [14] | HOL (simply-typed set theory) | interactiveᵃ | document-oriented IDE (Isabelle/jEdit [39]) or programmer's text editor (Proof General Emacs [1]) | BSD/LGPL/GPL | complete incl. proof |
| Theorema 2.0 [46] | FOL + set theoryᵇ | automatedᶜ | textbook-style documents, proof management GUI (add-on for Mathematica CAS) | GPLᵈ | statements complete, no proofᵉ |
| Mizar 8.1.01 [11] | FOLᶠ + set theory | batch verifier | CLIᵍ; programmer's text editor (Emacs add-on) | freeware/GPL + CC-BY-SAʰ | complete incl. proof |
| CASL/TPTPⁱ [5] | sorted FOL | automatedʲ | programmer's text editor (Emacs add-on), proof management GUI + CLI (Hets 0.98ᵏ [27]), web service (System on TPTP [34]) | GPL | complete incl. proof |

ᵃ Isabelle integrates internal and external automated provers.
ᵇ Theorema actually supports HOL. We, however, just needed FOL besides the
built-in sets, tuples, and the max operator.
ᶜ For each goal, the prover can be configured individually.
ᵈ Theorema is under GPL but needs the commercial, closed-source Mathematica.
Economists tend to be pragmatic about that.
ᵉ Theorema is in transition to the new version 2.0. Its architecture, inference
engine, and user interface are fully implemented, but its collection of
inference rules is still incomplete. Therefore, the proof does not yet work.
ᶠ Schemes permit a limited degree of higher-order reasoning.
ᵍ The verifier produces a list of numerical error codes and their source file
positions. The ancillary utilities errflag and addfmsg decorate source files
with this information, and optionally append terse textual explanations of the
relevant error codes.
ʰ The Mizar proof checker is closed-source; the MML is free.
ⁱ Common Algebraic Specification Language. 'CASL/TPTP' denotes our use of CASL
as an input language for automated FOL provers (here: SPASS, E, Darwin) using
the TPTP [32] exchange language. CASL has some second-order features, e.g.
inductive datatypes.
ʲ The proof is largely automatic. However, Vickrey's theorem is too complex for
automated proving in one step. Thus, the proof script introduces auxiliary
lemmas and selects suitable axioms and provers for proving them. Proof times
range from fractions of seconds if the exact list of axioms used is known
beforehand to hours if not. However, once a proof is found, the prover can
output the list of axioms used and thus speed up subsequent replays of the
proof.
ᵏ Heterogeneous tool set; gives access to a wide range of automated theorem
provers. We use FOL provers, most of which share the unsorted TPTP FOF [32] as a
common input format. Hets translates CASL to FOF by introducing auxiliary
predicates for sorts.

The CASL formalisation confirms the assumption of §3.3 that FOL suffices 
for expressing and proving the essence of Vickrey's theorem. For many FOL 
provers, CASL's (sub)sorts¹⁶ are mere syntactic sugar but allow us to stay close
to the domain language, speaking, e.g., of 'valuation vectors', each of which also 
is a valid 'bid vector'. Note that we have avoided using partial functions (e.g., for 
modelling out-of-scope vector indices) because of the complex logic translations 
required for coding them out. 

Isabelle and Mizar process the proof in a few seconds on a 2.5 GHz dual-core
processor; Hets/TPTP need about an hour; in Theorema the proof is not yet
complete. We used rather weak HOL features, e.g., no synthesis of functions.
Coinciding with earlier, general observations on HOL [8], the low processing
time suggests that there is no disadvantage in choosing a rich logic, which
allows for expressing relevant concepts (such as maxima of finite sets of real
numbers) naturally. Our formalisations' small size (less than 5 KB after
compression) does not yet warrant a precise quantitative judgement of time
efficiency. Particularly for FOL there exist highly optimised automated provers.
They are conveniently accessible in Hets, via the System on TPTP [34] web
service (accepting TPTP input that Hets can generate), but also from
Isabelle/HOL via the Sledgehammer interface (see §4.3). Still, we observed a
source of inefficiency in formalising for automated provers: the high share of
preconditions with long conjunctions in our CASL formalisation makes it hard for
the automated FOL provers to identify applicable axioms. Such conjunctions
result from the absence of structured proofs in CASL. Whenever a theorem is too
complex for automated proving, this absence forces us to 'emulate' proof steps
via auxiliary lemmas, whose antecedents are conjunctions of all relevant
assumptions in the current branch of the proof tree. Performance improvements by
guiding provers through the search space can, however, be achieved with the
extra effort of grouping frequently occurring conjunctions of assumptions into
single abstract predicates, as in the following concrete case for the proof of
Vickrey's theorem:
$\mathit{spaWithTruthfulOrOtherBid}(n, x, p, v, \tilde{b}, i, b) \Leftrightarrow \mathit{secondPriceAuction}(n, x, p) \wedge |v| = |\tilde{b}| = n \wedge \mathit{inRange}(n, i) \wedge \tilde{b}_i \neq v_i \wedge b = \tilde{b}[i \leftarrow v_i]$.

4.3 Proof Development and Management (req. C2a) 

The systems we studied offer different ways of invoking automated provers and
keeping track of proof efforts in progress. The 'apparent' difference between
automated and interactive theorem proving blurs at a closer look. The
interactive prover Isabelle features various automated proof methods;
furthermore Sledgehammer gives access to E, SPASS, and TPTP provers. One can
configure the facts they should take into account (e.g. local assumptions and
conclusions). For Mizar, there are also automated external tools (MPTP, MoMM,
MizAR) [31]. Theorema's automated proving workflow is conceptually similar:
specifying the knowledge to be used, then configuring the prover.¹⁷ Hets users
can select axioms and previously proved theorems to be sent to an automated
prover but have little control beyond that. Isabelle's prover configuration is
editable within the formalisation source. Theorema stores it in hidden fields
within the formalisation and exposes it via a dedicated GUI. Configuring proof
tools in Hets is separate from the formalisation: the proof management GUI does
not currently store settings persistently; however, one can write scripts to be
processed on the command line.

¹⁶ TPTP's typed first-order form (TFF [33]) is sorted, but without subsorts. We
have not used it, as Hets cannot currently produce it from CASL.

Just as Isabelle requires complex statements to be proved in multiple steps,
involving different proof methods, the automated provers of Theorema¹⁸ and
Hets also require guidance by explicit configuration at times, as can be seen
from the *.hpf proof scripts in our Hets formalisation [23]. Often, a theorem
$c : A \Rightarrow C$ was too complex for automated proving, whereas the job
could be done by a script that first proved auxiliary lemmas
$a : A \Rightarrow B$ and $b : B \Rightarrow C$, possibly with different
provers, and then proved $c$ providing only $a$ and $b$ as axioms. This is
conceptually the same as in Isabelle but has four significant user experience
differences: 1. each additional 'proof step' has to be stated as a lemma with
full assumptions on the left-hand side (similar to the example in §4.2),
2. CASL, originally a specification rather than a prover language, does not
syntactically distinguish theorems from lemmas, 3. the scripts have to be
maintained separately from the formalisation, and 4. a multi-step proof takes
many seconds longer, as Hets translates the input theory from CASL to the
respective prover's native language before each proof.¹⁹ This gives a clear
incentive to eliminate unnecessary proof steps from a CASL formalisation. This
experience also influenced our Isabelle formalisation, where writing multi-step
proofs is comparatively painless. There, one lemma had a three-step proof, until
experiments with the CASL formalisation made us attempt an automated proof.
Thus we realised that we could reduce the Isabelle proof to a single step.²⁰
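Logically, the script pattern just described is plain chaining of
implications. A minimal Lean 4 rendering makes the shape explicit; $A$, $B$ and
$C$ are hypothetical placeholders for the CASL statements, and the theorem name
is illustrative. The point is that $c$ is closed using nothing but the
auxiliary lemmas $a$ and $b$.

```lean
-- A, B, C stand in for the CASL statements (hypothetical placeholders).
variable (A B C : Prop)

-- Lemmas a and b are proved first, possibly by different provers;
-- theorem c is then closed using only a and b as premises.
theorem c_from_lemmas (a : A → B) (b : B → C) : A → C :=
  fun hA => b (a hA)
```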

Mizar differs by focusing, instead of built-in tactics and automated proof
methods, on a natural deduction style which 'tries to "keep a low profile" in
its logical foundations' and aims at 'clarity, human readability and closeness
to standard mathematical proofs' [38]. Influenced by Mizar, the Isar language
('intelligible semi-automated reasoning') replaced Isabelle's original tactic
interface. In the interest of readability, Mizar deliberately prevents users
from extending the verifier's power [38, §2.1], often forcing them to justify
trivial passages. Mizar's registrations do allow for custom automation [4];
however, such exploits, which are at times involved, often push registrations
beyond their intended scope [20] and may result in implicit inferences and less
readable proofs.

Particularly in developing the proof of a theorem as complex as Vickrey's
top-down, it is useful to defer proofs of lemmas or proof steps, so as to use
them in a larger proof without the workaround of temporarily declaring them as
axioms. Theorema proofs can use unproved theorems as knowledge. Isabelle's
sorry keyword creates a fake proof. CASL theorems are formulas with the
annotation %implied. When imported into a theory, (open) theorems become
axioms, and Hets can use them without proof, but the open proof obligation is
still visible in the imported theory. Mizar's verifier offers top-down proving
for free by marking unaccepted inferences as errors and then proceeding. This
results in a formal proof sketch, 'very close to informal mathematical English'
but still close to a fully formalised proof [41]. Furthermore, one can prefix
the keyword proof with @ to expressly and silently skip a proof, or disable the
verifier on arbitrary code portions using pragmas. Mizar's Emacs mode exposes
these as one-touch macros, which speeds up the verification process and
improves interaction [38].

¹⁷ For Theorema, a prover is a collection of inference rules applied in a
certain strategy.

¹⁸ This assessment relies on experience with Theorema 1.

¹⁹ This is necessary as, by default, each successful proof adds one theorem to
the theory.

²⁰ As it makes use of one definition and two lemmas, this was not obvious a
priori.

4.4 Library Coverage and Searchability (reqs. C1c, C2b)

To a varying degree we have been able to reuse mathematical foundations from
the systems' libraries. Isabelle can find reusable material by find_theorems
queries; Sledgehammer helps to extract a sufficient set of lemmas from the
library, which is then minimised towards a necessary set. MML Query is a search
engine for the MML [3]. CASL's library is searchable as plain text; Theorema's
is not.

Theorema has a built-in tuple type, including a maximum operation, which we
used to formalise bid vectors. The CASL library provides inductive datatypes
such as arrays [29] but no n-argument maximum operation. The Isabelle/HOL
library provides a Max operation on finite sets, and various Cartesian product
types suitable for representing bids. Given Isabelle's functional programming
syntax, we nevertheless found it most intuitive to model our own vectors as
functions $\mathbb{N} \to \mathbb{R}$ evaluated up to a given $n$. Wrappers
make the set maximum operator work on these vectors and prove the properties
required subsequently. Our Mizar formalisation draws on generic relations and
functions, which the MML richly covers. Thus, we only had to add a few
interfacing lemmas.
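The Isabelle modelling choice (vectors as functions that are only inspected up
to $n$, plus wrappers that reduce the vector maximum to a maximum over a finite
set) can be mimicked in Python. This is an analogy for illustration, not a
transcription of the Isabelle theory; all names are hypothetical.

```python
from typing import Callable, Set

# A vector is modelled as a function from indices to reals; only 1..n is inspected.
Vector = Callable[[int], float]


def participants(n: int) -> Set[int]:
    """N = {1, ..., n} as a finite set of naturals."""
    return set(range(1, n + 1))


def vec_max(n: int, y: Vector) -> float:
    """hat{y}: the set maximum of the finite image {y(j) : j in N}."""
    return max({y(j) for j in participants(n)})


def vec_max_except(n: int, y: Vector, i: int) -> float:
    """hat{y}_{-i}: the set maximum over N without i (assumes n >= 2)."""
    return max({y(j) for j in participants(n) - {i}})


def bids(j: int) -> float:
    """Example bid vector; the value 100.0 outside N is never inspected for n = 3."""
    return {1: 5.0, 2: 8.0, 3: 3.0}.get(j, 100.0)


print(vec_max(3, bids), vec_max_except(3, bids, 2))  # 8.0 5.0
```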

4.5 Term Input Syntax (req. C1b)

Conversations with auction designers suggest that they find Theorema's term
input syntax most accessible. The two-dimensional notation in Mathematica
notebooks is similar to textbook notation, and our target audience is largely
familiar with Mathematica. The syntax of Isabelle and CASL is closer to
programming languages. Isabelle's functional type syntax f :: A ⇒ B ⇒ C looks
less closely related to textbook notation than CASL's f : A × B → C. Isabelle,
CASL and Mizar allow for defining custom 'mixfix' operator notations. Isabelle
provides rich translation mechanisms beyond that, but the layout remains
one-dimensional, e.g. ∀x ∈ A. B(x) instead of Theorema's $\forall_{x \in A} B[x]$
for bounded quantification. Isabelle Proof General and Isabelle/jEdit
approximate textbook notation by Unicode symbols. Isabelle, Mizar and Hets can
export LaTeX. Mizar uses ASCII; its lack of binders makes mathematical concepts
such as limits and sums cumbersome to denote [43]. A major reason for us not to
cover the TPTP language is its technical, non-extensible ASCII syntax (using,
e.g., ! / ? for ∀ / ∃).

Theorema, CASL and Mizar support sharing common quantified variables across
multiple statements, corresponding to the practice of starting a textbook
section, even one comprising several axioms, with a phrase like 'let n, the
number of participants, be a natural number > 1'. This helps to avoid
redundancy but is prone to copy/paste errors. For example, our CASL
formalisation has sections with global quantifiers ∀i, j (e.g. to accommodate
the maximum and second-price auction definitions of §3.2), but these include
axioms that only use i. Literally pasting into such an axiom an expression using
j does not cause an error, as j is bound in the current scope as well, but
changes the semantics of the axiom in a way that is hard to detect.

4.6 Comprehensibility and Trustability of the Output (req. C2c) 

Machine proofs may 'succeed' for unintended reasons, e.g. accidentally stating 
a tautology such as an implication with an unsatisfiable antecedent. Or they 
succeed as intended, but the user cannot follow the (automated) deduction. In 
such situations the prover's output is crucial. Isabelle provides tracing facilities 
for simplification rules and introduction and elimination rules used in standard 
reasoning steps. Its inferenc;e kernel can produce a full record (usually large and 
unreadable) of the internal reasoning of automated tools via explicit proof terms, 
e.g. for independent checking. By default the kernel relies on static ML type- 
discipline to achieve correctness by construction, without explicit proof terms. 
Theorema's proof data structure captures the entire proof generation according 
to the rules and strategy selected. It can be displayed as a structured textbook- 
style proof with configurable verbosity, and visualised as a browsable tree that 
distinguishes successful from failed branches. Mizar 'just' verifies what the user 
wrote according to natural deduction rules, hence the user is unlikely to doubt the
result. On the other hand, for the same reason, Mizar has no way to detect proofs
succeeding for unintended reasons, and offers little help to a user clueless about 
a failing step. A correct Mizar proof can be improved by enhancer utilities [11, 
§4.6]: some report useful additional information (e.g., unneeded statements
referred to in a step, unneeded library files, unneeded lemmas); others cut steps that
a human might want to see, impacting readability and possibly the original con- 
fidence the user had in the proof. Hets uniformly displays the success of a proof 
and the list of axioms used; however the latter output is only informative with 
SPASS. Otherwise, the raw technical output of the prover is displayed, which 
strongly differs across provers. E.g., SPASS uses a resolution calculus, whose output
looks very different from a textbook proof. Similarly, SystemOnTPTP outputs performance
measures and the status of the given problem (e.g. 'Theorem' or 'Unsatisfiable'), 
but otherwise the raw prover output. 

When a proof attempt fails because the statement was wrong, studying a 
counterexample may help. Isabelle has the Nitpick counterexample finder built 
in. Hets integrates several such tools (Darwin is supported best [28]) and also
employs them for consistency checking, as importing a theory whose axioms have no
model results in vacuous truth. Both Isabelle and Hets can attempt a proof or
otherwise try to find a counterexample in the same run. Theorema and Mizar
do not support counterexample search.
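Both uses of a finite model search, refuting a conjecture and detecting an inconsistent axiom set, can be illustrated by naive bounded enumeration. This Python sketch is only an analogy: Nitpick and Darwin use far more sophisticated techniques, and an empty result merely means nothing was found within the bound, not that a proof exists. The second-price rule (ties broken towards the lowest index) and all function names are our illustrative assumptions.

```python
from itertools import product

def second_price(bids):
    """Illustrative second-price rule: the highest bid wins (ties go to
    the lowest index); the winner pays the highest *other* bid."""
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def bid_vectors(n, max_bid):
    """All bid vectors of n participants with integer bids 0..max_bid."""
    return product(range(max_bid + 1), repeat=n)

def find_model(axioms, n, max_bid):
    """First bid vector satisfying every axiom, or None within the bound."""
    for bids in bid_vectors(n, max_bid):
        if all(ax(bids) for ax in axioms):
            return bids
    return None

def find_counterexample(axioms, conjecture, n, max_bid):
    """First bid vector satisfying the axioms but violating the conjecture."""
    for bids in bid_vectors(n, max_bid):
        if all(ax(bids) for ax in axioms) and not conjecture(bids):
            return bids
    return None

def pays_own_bid(bids):
    # False conjecture: "the winner pays their own bid".
    winner, price = second_price(bids)
    return price == bids[winner]

def pays_at_most_own_bid(bids):
    winner, price = second_price(bids)
    return price <= bids[winner]

print(find_counterexample([], pays_own_bid, 2, 2))          # (0, 1)
print(find_counterexample([], pays_at_most_own_bid, 3, 3))  # None

# Inconsistent axioms have no model, so every conjecture "holds" vacuously.
inconsistent = [lambda b: min(b) > 0, lambda b: min(b) == 0]
print(find_model(inconsistent, 2, 2))                         # None
print(find_counterexample(inconsistent, pays_own_bid, 2, 2))  # None
```

The last line shows why a consistency check matters: under inconsistent axioms, even the false conjecture survives the search.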

Before proving, all systems check whether the input is syntactically well- 
formed and well-typed. Isabelle/jEdit performs parsing, type checking and proof 
processing during editing, and attaches warnings and error messages like modern 
IDEs. The other systems require the user to explicitly initiate checking. Mizar 
and Hets check complete files, whereas in Theorema (which only checks syntax), 
one can individually check each notebook cell (typically containing one to a few 
statements). Mizar's verifier is particularly error-resilient: it seldom aborts before
the last input line, thus reporting errors for the whole file. 

4.7 Online Community Support and Documentation (req. C2d) 

Community support and documentation are major prerequisites for system ad- 
option. We assume that users with little previous mechanised reasoning and 
formalisation knowledge will seek low-threshold support from tutorial documents 
or mailing lists rather than attending community meetings, which in theorem
proving have so far focused on scientific/technical aspects rather than applications.

We compare the community sizes, assuming that large communities are re- 
sponsive even to non-experts: Isabelle is developed at multiple institutions; its 
user mailing list gets more than 100 posts a month, with over 1000 different 
authors since 2000. CASL, an international standard, has been subject of hun- 
dreds of publications but does not currently have a mailing list. Hets is mainly 
developed and used within a single institution; its user mailing list receives fewer
than 10 posts a month. Since Hets is an integrative environment, users
can also request help from the communities of TPTP (subject of more than 1000
publications, no mailing list) and of the individual provers. Theorema is developed
within a single institution and will not have a mailing list before the 2.0 release. 
Mizar is developed at one institution by a team that provides dedicated email 
user assistance: the 'Mizar User Service'. MML grows by 30-60 articles a year, 
with 241 contributors so far. The mailing list gets around 10 posts a month. 

Isabelle and CASL feature comprehensive tutorials and reference manuals;
Hets has a user guide, and Mizar offers tutorials [26]. Theorema has partial built-in
help texts and is documented in a few publications. 

5 Related Work 

§1 mentioned earlier efforts to formalise economics. Particularly Arrow's im- 
possibility theorem, one of the most striking results in theoretical economics, has 
been a focus for formalisation efforts, including Nipkow's Isabelle and Wiedijk's
Mizar formalisation [30, 42]. As in our case (cf. §3.2), they required initial paper 
elaboration; additionally, it helped them to identify omissions in their source [9]. 
This source states three alternative proofs, but Tang's/Lin's fourth, induction- 
based proof, allowed for obtaining insights on the general structure of social 
choice impossibility results using computer support [36].
The formal verification technique of model checking has been applied to
auctions. Tadjouddine et al. proved the strategy statement of Vickrey's theorem via
two abstractions to reduce the model checker's search space: program slicing to 
remove variables irrelevant w.r.t. the property, and discretising bid values (e.g. 
'higher than someone's valuation v_i') [35]. Our formalisation is, to the best of our
knowledge, the first for theorem provers; in the more expressive languages it has
the comprehensibility advantage of preserving the structure of the original domain
problem. It differs from the earlier economics formalisation efforts cited above
in its goal to (ultimately) help economists to use formal methods themselves.
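In the spirit of the bid discretisation described above, though without the program slicing and not reproducing the actual tool of [35], the strategy statement can be checked exhaustively on small instances: for every bidder, valuation, profile of other bids and deviating bid, truthful bidding must yield at least the utility of the deviation. This Python sketch assumes integer bids on a small grid, ties broken towards the lowest index, and quasi-linear utility (valuation minus price for the winner, zero otherwise); a passing check covers only the enumerated instances. Running it on a first-price rule, where truthful bidding is known not to be dominant, confirms the checker can fail.

```python
from itertools import product

def winner_of(bids):
    """Highest bid wins; ties go to the lowest index (an assumption)."""
    return max(range(len(bids)), key=lambda i: (bids[i], -i))

def second_price(bids):
    w = winner_of(bids)
    return w, max(b for i, b in enumerate(bids) if i != w)

def first_price(bids):
    w = winner_of(bids)
    return w, bids[w]

def utility(auction, bids, i, valuation):
    """Quasi-linear utility of participant i."""
    w, price = auction(bids)
    return valuation - price if w == i else 0

def truthful_violation(auction, n, grid):
    """Search for a profitable deviation from truthful bidding.
    Returns (bidder, valuation, deviating bids) or None."""
    for i in range(n):
        for v in grid:
            for others in product(grid, repeat=n - 1):
                truthful = list(others[:i]) + [v] + list(others[i:])
                for b in grid:
                    deviating = truthful.copy()
                    deviating[i] = b
                    gain = (utility(auction, deviating, i, v)
                            - utility(auction, truthful, i, v))
                    if gain > 0:
                        return i, v, tuple(deviating)
    return None

print(truthful_violation(second_price, 3, range(4)))  # None
print(truthful_violation(first_price, 2, range(4)))   # (0, 1, (0, 0))
```

The enumeration grows as |grid|^n per bidder, which is why reducing the search space through abstraction, as in [35], matters for model checking.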

Our focus thus lies on comparing different provers by full parallel formalisa- 
tion. Wiedijk compared Isabelle/HOL, Mizar, Theorema, and 14 other provers 
by general, technical criteria, studying the code resulting from experts formalising
a pure mathematics theorem (√2 ∉ ℚ), and comparing it to a detailed
paper proof [44]. We complement this with the end user's perspective: our ob- 
servations, e.g., on the closeness of the input syntax to textbook notation or 
the comprehensibility of the output are general, but we emphasised these cri- 
teria as they are important to auction designers. Griffioen's/Huisman's 1998 
PVS and Isabelle/HOL comparison is, like Wiedijk's, independent of a specific
application but closer to ours in its look at systems' weaknesses from a
user's perspective [12]. Like us, they rate proof management and user support, 
but go into more detail up to the 'time it takes to fix a bug'. Their findings on 
user interfaces have been rendered obsolete by progress in developing textbook-like proof
languages and editors with random access and asynchronous validation. 

6 Conclusion and Outlook 

Auctions allocate trillions of dollars in goods and services every year, but their 
design is still 'far less a science than an art' [24]. We aim at making it a science
by enabling auction designers to verify their designs. By parallel formalisation
of the first major theorem in a toolbox for basic auction theory (ATT), we have
investigated the suitability of four different theorem provers for this job, taking
the perspective not only of experienced formalisers but also of our end users.
Our contribution is 2 × 2-fold: 1. to auction designers we provide (a) a growing
library to build their formalisations on, and (b) guidelines on what systems to
use; 2. to the CICM community we provide (a) challenge problems and (b) user
experience feedback from a new audience. This paper focuses on 1b and 2b.

For a concrete application, our findings confirm the widespread intuitions 
that formalisation benefits from an initial paper elaboration, that the 'auto- 
mated vs. interactive' distinction proves of little importance in practice, and 
that no single system satisfies all requirements. For now, our comparison results 
in Tab. 2 guide auction designers in choosing a system, given their formalisation 
requirements and experience. The ideal theorem proving environment would feature
a library as versatile as those of Isabelle or Mizar, a prover as efficient as those
of Isabelle or Mizar, error messages as informative as those of Isabelle/jEdit,
a proof input language as close to textbook style as those of Isabelle or Mizar
(or, alternatively, an interface for exploring automated proofs as informative as
Theorema's), a textbook-like term syntax like Theorema's, an integration of diverse tools as in
Isabelle or Hets, and a community as lively as Isabelle's. We have not yet ex- 
ploited all strengths of the systems evaluated: maintaining a growing ATT with 
increasingly complex dependencies will benefit from stronger modularisation, as 
supported by Isabelle and even more so by the theory graph management of 
Hets/CASL. Regarding auction practice, we are working towards ways to check 
that formal definitions of auctions are well-defined functions ('for each admiss- 
ible bid input there is a unique outcome, modulo some randomness'). Given a 
constructive proof of this property, it should be possible to obtain verified pro- 
gram code that determines the outcome of an auction given the bids. This may 
work using Isabelle's code generator, but we will also explore provers based on 
constructive type theory. 
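The well-definedness property can be illustrated on small instances by comparing a relational definition, in the style of axioms characterising admissible outcomes, with an executable rule. This Python sketch is a hypothetical illustration, not output of Isabelle's code generator: `is_second_price_outcome` states which (winner, price) pairs are admissible, and `outcomes` enumerates them. With tied highest bids the relation admits several outcomes, which is exactly where a tie-breaking rule (or randomness) must enter before a unique function exists.

```python
from itertools import product

def is_second_price_outcome(bids, winner, price):
    """Relational definition: the winner submitted a highest bid and pays
    the highest bid among the other participants."""
    others = [b for i, b in enumerate(bids) if i != winner]
    return bids[winner] == max(bids) and price == max(others)

def outcomes(bids):
    """All (winner, price) pairs admitted by the relation. Candidate
    prices are limited to submitted bids, which suffices because the
    admissible price is always one of the other bids."""
    return [(w, p) for w in range(len(bids)) for p in sorted(set(bids))
            if is_second_price_outcome(bids, w, p)]

def second_price(bids):
    """Executable rule with an assumed tie-break: lowest index wins."""
    w = max(range(len(bids)), key=lambda i: (bids[i], -i))
    return w, max(b for i, b in enumerate(bids) if i != w)

def ambiguous_inputs(n, max_bid):
    """Bid vectors (integer bids 0..max_bid) without a unique outcome
    under the relation; also checks that the executable rule always
    picks one of the admissible outcomes."""
    ambiguous = []
    for bids in product(range(max_bid + 1), repeat=n):
        found = outcomes(bids)
        if second_price(bids) not in found:
            raise ValueError(f"code disagrees with relation on {bids}")
        if len(found) != 1:
            ambiguous.append(bids)
    return ambiguous

print(outcomes((1, 2)))        # [(1, 1)]
print(outcomes((2, 2)))        # [(0, 2), (1, 2)]: tied highest bids
print(ambiguous_inputs(2, 2))  # [(0, 0), (1, 1), (2, 2)]
```

In a prover, the same comparison would be stated and proved for all inputs rather than tested on a bounded grid.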

Broader conclusions about auction theory require further research. Bidding 
typically requires forming conjectures about others' beliefs, involving integration over
conditional density functions (cf., e.g., Proposition 13 in Maskin's review [24]).
We expect much of the required foundations to be available already
in the libraries of Isabelle and Mizar. Maskin limits his review to single-good
auctions, noting that few general results exist for multi-unit and combinatorial 
auctions. Such auctions are often more economically critical (e.g. spectrum 
auctions, monetary policy [19]) but also more complicated. The real challenge 
for mechanised reasoning will be to demonstrate its use in this domain. 

Our problems do not currently challenge the systems' performance, but rather the
promises of their languages and libraries.

The last two chapters of [25] address multi-unit auctions; multi-unit and combinat- 
orial auctions are the focus of [7]. 

Even more ambitiously, many results in auction theory are simplified or extended 
by explicit application of mechanism design; cf. [17]. 